
    Efficient, Scalable, and Accurate Program Fingerprinting in Binary Code

    Why was this binary written? Which compiler was used? Which free software packages did the developer use? Which sections of the code were borrowed? Who wrote the binary? These questions are of paramount importance to security analysts and reverse engineers, and binary fingerprinting approaches may provide valuable insights that can help answer them. This thesis advances the state of the art by addressing some of the most fundamental problems in program fingerprinting for binary code, notably, reusable binary code discovery, fingerprinting free open source software packages, and authorship attribution. First, to tackle the problem of discovering reusable binary code, we employ a technique for identifying reused functions by matching traces of a novel representation of binary code known as the semantic integrated graph. This graph enhances the control flow graph, the register flow graph, and the function call graph, key concepts from classical program analysis, and merges them with other structural information to create a joint data structure. Second, we approach the problem of fingerprinting free open source software (FOSS) packages by proposing a novel resilient and efficient system that incorporates three components. The first extracts the syntactic features of functions by considering opcode frequencies and performing a hidden Markov model statistical test. The second applies a neighborhood hash graph kernel to random walks derived from control flow graphs, with the goal of extracting the semantics of the functions. The third applies the z-score to normalized instructions to extract the behavior of the instructions in a function. The components are then integrated using a Bayesian network model that synthesizes the results to determine the FOSS function, making it possible to detect user-related functions. Third, with these elements in place, we present a framework capable of decoupling binary program functionality from the coding habits of authors. To capture coding habits, the framework leverages a set of features based on collections of functionality-independent choices made by authors during coding. Finally, it is well known that techniques such as refactoring and code transformations can significantly alter the structure of code, even for simple programs. Applying such techniques or changing the compiler and compilation settings can significantly affect the accuracy of available binary analysis tools, which severely limits their practicability, especially when applied to malware. To address these issues, we design a technique that extracts the semantics of binary code in terms of both data and control flow. The proposed technique allows more robust binary analysis because the extracted semantics of the binary code is generally immune to code transformation, refactoring, and varying the compilers or compilation settings. Specifically, it employs data-flow analysis to extract the semantic flow of the registers as well as the semantic components of the control flow graph, which are then synthesized into a novel representation called the semantic flow graph (SFG). We evaluate the framework on large-scale datasets extracted from selected open source C++ projects on GitHub, Google Code Jam events, Planet Source Code contests, and students’ programming projects, and find that it outperforms existing methods in several respects. First, it is able to detect reused functions. Second, it can identify FOSS packages in real-world projects and reused binary functions with high precision. Third, it decouples authorship from functionality so that it can be applied to real malware binaries to automatically generate evidence of similar coding habits. Fourth, compared to existing research contributions, it successfully attributes a larger number of authors with significantly higher accuracy. Finally, the new framework is more robust than previous methods in the sense that there is no significant drop in accuracy when the code is subjected to refactoring techniques, code transformation methods, and different compilers.
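
    As a rough, hypothetical illustration of the syntactic component described above (opcode frequencies with z-score normalization over instructions), the following Python sketch builds opcode-frequency vectors for two toy functions and standardizes them. The function names and opcode streams are invented for illustration and are not taken from the thesis.

```python
# Minimal sketch (not the thesis implementation): opcode-frequency features
# with z-score normalization, loosely mirroring one FOSS-fingerprinting
# component described above. Names and toy data are hypothetical.
from collections import Counter
import math

def opcode_histogram(opcodes):
    """Relative frequency of each opcode in one disassembled function."""
    counts = Counter(opcodes)
    total = sum(counts.values())
    return {op: c / total for op, c in counts.items()}

def zscore_normalize(values):
    """Standardize a list of feature values to zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # avoid division by zero for constant vectors
    return [(v - mean) / std for v in values]

# Toy usage: two hypothetical functions described by their opcode streams.
f1 = ["mov", "mov", "add", "jmp", "mov", "cmp", "jne"]
f2 = ["push", "mov", "call", "pop", "ret"]
vocab = sorted(set(f1) | set(f2))
vec1 = [opcode_histogram(f1).get(op, 0.0) for op in vocab]
vec2 = [opcode_histogram(f2).get(op, 0.0) for op in vocab]
print(zscore_normalize(vec1))
print(zscore_normalize(vec2))
```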

    QoS based Route Management in Cognitive Radio Networks

    Cognitive radio networks are smart networks that automatically sense the channel and adjust the network parameters accordingly. Cognitive radio is an emerging technology that enables the dynamic deployment of highly adaptive radios built upon software-defined radio technology, allowing unlicensed operation in licensed bands. The cognitive radio network paradigm therefore raises many technical challenges, such as power efficiency, spectrum management, spectrum detection, environment awareness, path selection and robustness, and security. Traditionally, routing approaches in wired networks allow each node a maximum load through the selected route, while routing approaches in wireless networks have each node broadcast its request with the identification of the required destination. However, existing routing approaches in cognitive radio networks (CRNs) follow the traditional wireless approaches, especially those applied to ad hoc networks. In addition, these traditional approaches do not take into account spectrum trading or spectrum competition among licensed users (PUs). In this thesis, a novel QoS-based route management approach is proposed through two models: one without game theory and one with game theory. The proposed QoS routing algorithm contains the following elements: (i) a profile for each user, containing parameters such as the unlicensed user (secondary user, SU) identification, the number of neighbors, the channel identification, the neighbor identification, the probabilities of idle slots, and the licensed user (primary user, PU) presence; the radio functionality of CRN nodes gives them the capability to sense channels, so each node shares its profile with the sensed PU, which then exchanges its profile with other PUs; (ii) spectrum trading, in which a PU calculates its price based on the SU requirements; (iii) spectrum competition, for which a new coefficient α is defined that controls the price under competition among PUs and depends on factors such as the number of primary users, the available channels, and the duration of usage; (iv) a new QoS function that provides different levels of quality of service to SUs; and (v) a game-theoretic formulation that adds flexibility and captures the dynamic behavior of users in finding solutions to the model. Based on these elements, all possible paths are managed and categorized according to the level of QoS requested by SUs and the price offered by the PU. The simulation results show that the aggregate throughput and average delay of the routes determined by the proposed QoS routing algorithm are superior to those of existing wireless routing algorithms. Moreover, network dynamics are examined under different levels of QoS.
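
    The route-selection idea can be sketched roughly as follows: candidate paths are filtered by the QoS level requested by the SU and then ranked by the PU-offered price, scaled by a competition coefficient α. The price formula and all data below are illustrative assumptions, not the thesis's actual model.

```python
# Minimal sketch (not the thesis's model): ranking candidate CRN routes by the
# QoS level they support and the PU-offered price. The pricing rule, the
# competition coefficient alpha, and the data are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Route:
    path: list          # sequence of node ids
    qos_level: int      # QoS level the route can sustain (higher is better)
    base_price: float   # PU asking price before competition adjustment
    alpha: float        # competition coefficient (more competing PUs -> lower alpha)

def effective_price(route: Route) -> float:
    """Competition among PUs scales the asking price down (assumed rule)."""
    return route.base_price * route.alpha

def select_route(routes, requested_qos):
    """Keep routes that satisfy the requested QoS, then pick the cheapest."""
    feasible = [r for r in routes if r.qos_level >= requested_qos]
    return min(feasible, key=effective_price) if feasible else None

routes = [
    Route(path=[1, 4, 7], qos_level=3, base_price=10.0, alpha=0.8),
    Route(path=[1, 2, 7], qos_level=2, base_price=6.0, alpha=0.9),
    Route(path=[1, 5, 6, 7], qos_level=3, base_price=9.0, alpha=0.7),
]
best = select_route(routes, requested_qos=3)
print(best.path, effective_price(best))
```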

    Conceptualising the Role of the UAE Innovation Strategy in University-Industry Knowledge Diffusion Process

    Universities are considered one of the primary sources of knowledge and an essential component of the triple helix theory. They supply industries with the expertise and pool of resources required to operate efficiently. Moreover, entrepreneurial universities have successfully contributed to regional development and employment growth by supporting entrepreneurial activities and incubation programmes. Thus, university-industry collaboration is vital for enhancing knowledge diffusion in knowledge-based industries as well as regional innovation climates. At the same time, countries and regional authorities strive to stimulate regional development by encouraging innovation and entrepreneurship activities. For example, the UAE announced its 2015 innovation strategy, which focused on seven sectors: education, technology, renewable energy, transportation, health, water, and space. The strategy stressed the role of universities' R&D, first-class research, and incubation services as among the country's main innovation enablers. Thus, universities, scholars, and industry should concentrate on the identified sectors to achieve the strategic innovation goals. This work aims to conceptualise and test the relationship and collaboration between industry and universities in the UAE and the impact of the innovation strategy on this relationship. We therefore critically analyse the literature on the university-industry relationship and connect it with the UAE innovation strategy, resulting in a conceptual university-industry relationship model in which the innovation strategy and the UAE government act as moderators of the relationship. The initial results show that the conceptual model includes research and curriculum collaboration. Research collaboration covers joint research, research funding, and commercialisation of research output, while curriculum collaboration covers programme and course updates and joint training programmes. The developed model is still at an early stage of development and requires further refinement based on interviews with HEI researchers and the survey results.

    BinGold: Towards robust binary analysis by extracting the semantics of binary code as semantic flow graphs (SFGs)

    Binary analysis is useful in many practical applications, such as the detection of malware or vulnerable software components. However, our survey of the literature shows that most existing binary analysis tools and frameworks rely on assumptions about specific compilers and compilation settings. It is well known that techniques such as refactoring and light obfuscation can significantly alter the structure of code, even for simple programs. Applying such techniques or changing the compiler and compilation settings can significantly affect the accuracy of available binary analysis tools, which severely limits their practicability, especially when applied to malware. To address these issues, we propose a novel technique that extracts the semantics of binary code in terms of both data and control flow. Our technique allows more robust binary analysis because the extracted semantics of the binary code is generally immune to light obfuscation, refactoring, and varying the compilers or compilation settings. Specifically, we apply data-flow analysis to extract the semantic flow of the registers as well as the semantic components of the control flow graph, which are then synthesized into a novel representation called the semantic flow graph (SFG). Subsequently, various properties, such as reflexive, symmetric, antisymmetric, and transitive relations, are extracted from the SFG and applied to binary analysis. We implement our system in a tool called BinGold and evaluate it against thirty binary code applications. Our evaluation shows that BinGold successfully determines the similarity between binaries, yielding results that are highly robust against light obfuscation and refactoring. In addition, we demonstrate the application of BinGold to two important binary analysis tasks: binary code authorship attribution and the detection of clone components across program executables. The promising results suggest that BinGold can be used to enhance existing techniques, making them more robust and practical.
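
    A minimal sketch of the comparison step, assuming an SFG is represented as a graph with labeled edges: two functions are compared by the overlap of their labeled edge sets. The Jaccard score and the toy graphs below are illustrative choices, not BinGold's actual similarity metric.

```python
# Minimal sketch (not BinGold itself): comparing two functions through a
# graph-of-labeled-edges abstraction, loosely in the spirit of the semantic
# flow graph. Edge labels and the Jaccard score are illustrative assumptions.
def edge_set(sfg):
    """Flatten a graph given as {node: [(neighbor, label), ...]} into labeled edges."""
    return {(src, dst, label) for src, targets in sfg.items() for dst, label in targets}

def sfg_similarity(sfg_a, sfg_b):
    """Jaccard similarity over labeled edges; 1.0 means identical edge sets."""
    a, b = edge_set(sfg_a), edge_set(sfg_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Toy SFGs: nodes are basic blocks, labels summarize the register data flow.
original = {"b0": [("b1", "eax<-mem"), ("b2", "cmp eax")],
            "b1": [("b2", "ret eax")]}
refactored = {"b0": [("b1", "eax<-mem"), ("b2", "cmp eax")],
              "b1": [("b3", "ret eax")]}
print(round(sfg_similarity(original, refactored), 2))
```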

    The Good, The Bad, and The Ugly About Insta Shopping: A Qualitative Study

    Instagram, like many social media platforms, is increasingly used to shop for goods and products from businesses or other individuals. Recent studies have shed light on the acceptance and usage of Insta shopping from users’ perspectives by following popular technology models, such as the technology acceptance model (TAM) and the unified theory of acceptance and use of technology (UTAUT). However, richer and more in-depth insights about using Instagram for commercial purposes within a specific context have yet to be discovered. Therefore, this study aims to discover users’ experiences and interactions with Insta shopping, the factors and drivers that impact users’ acceptance of Insta shopping, the weight of each factor (degree of consensus among participants), and its direction (positive, negative, or both). The study followed a qualitative approach, creating four homogeneous focus groups (six participants each) of IT students in United Arab Emirates (UAE) universities. The data analysis used an axial coding technique drawn from grounded theory, which includes open coding, axial coding, and selective coding stages. The results revealed that the time factor, trust in Insta shops (and its drivers, such as reviews, word of mouth, a trading license, and others), distrust (and its drivers, such as fake comments and reviews, extremely low prices, and others), and the associated risks (financial risk of losing money, security risk from online payments, and privacy issues) can impact users’ behavior toward Insta shopping. The study also classified participants’ viewpoints and experiences into themes of advantages, disadvantages, and issues associated with Insta shopping. Finally, the study outlines theoretical and practical implications and suggests future research directions.

    Efforts and Suggestions for Improving Cybersecurity Education

    In this era of rapidly growing technology, one of the main concerns is cyber threats. Tackling this issue requires highly skilled and motivated cybersecurity professionals who can prevent, detect, respond to, or even mitigate the effects of such threats. However, the world faces a shortage of qualified cybersecurity professionals and practitioners. To address this shortage, several cybersecurity educational programs have arisen. What was once just a couple of courses in a computer science graduate program is now a range of cybersecurity courses offered at the high school level, in undergraduate computer science and information systems programs, and even at the government level. Owing to the peculiar nature of cybersecurity, educational institutions face many issues when designing a cybersecurity curriculum or cybersecurity activities.

    DroidDetectMW: A Hybrid Intelligent Model for Android Malware Detection

    Malicious apps specifically aimed at the Android platform have increased in tandem with the proliferation of mobile devices. Malware is now so carefully written that it is difficult to detect, and due to the exponential growth in malware, manual analysis methods are increasingly ineffective. Although prior researchers have proposed numerous high-quality approaches, static and dynamic assessments inherently necessitate intricate procedures. The obfuscation methods used by modern malware are incredibly complex and clever, so such malware cannot be detected using static analysis alone. This work therefore presents a hybrid analysis approach, partially tailored for multiple-feature data, for identifying Android malware and classifying malware families, in order to improve Android malware detection and classification. The proposed hybrid method combines static and dynamic malware analysis to give a full view of the threat. The framework consists of three distinct phases. The first phase performs pre-processing, including normalization and feature extraction. In the second phase, both static and dynamic features undergo feature selection; two feature selection strategies are proposed to choose the best subset of features for each. The third phase applies a newly proposed detection model to classify Android apps; this model uses a neural network optimized with an improved version of HHO. Both binary and multi-class classification are applied: binary classification to separate benign from malicious apps, and multi-class classification to detect malware categories and families. Using the features gleaned from static and dynamic malware analysis, several machine-learning methods are employed for malware classification. According to the experimental results, the hybrid approach improves the accuracy of Android malware detection and classification compared to the scenario in which static and dynamic information are considered separately.
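
    The hybrid pipeline can be sketched roughly as follows, with scikit-learn stand-ins replacing the paper's two feature selection strategies and the HHO-optimized neural network; the synthetic data and parameter choices are purely illustrative.

```python
# Minimal sketch of the hybrid idea only: concatenate static and dynamic
# feature vectors, select a subset, and train a neural network classifier.
# SelectKBest and MLPClassifier are stand-ins for the paper's selection
# strategies and HHO-optimized network; the random data is illustrative.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_apps = 200
static_feats = rng.integers(0, 2, size=(n_apps, 50))    # e.g., requested permissions
dynamic_feats = rng.integers(0, 2, size=(n_apps, 30))   # e.g., observed API calls
X = np.hstack([static_feats, dynamic_feats])
y = rng.integers(0, 2, size=n_apps)                     # 0 = benign, 1 = malware (toy labels)

X_sel = SelectKBest(chi2, k=20).fit_transform(X, y)     # keep the 20 "best" features
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("toy accuracy:", clf.score(X_te, y_te))
```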

    AI in Education: Improving Quality for Both Centralized and Decentralized Frameworks

    Education is essential for achieving many Sustainable Development Goals (SDGs). Therefore, education systems focus on producing more educated people and improving their quality. One of the latest technologies for enhancing the quality of education is Artificial Intelligence (AI)-based Machine Learning (ML), which consequently has a significant influence on the education system. ML is now widely applied in education for various tasks, such as building models that monitor student performance and activities to accurately predict student outcomes, engagement in learning activities, decision-making, and problem-solving capabilities. In this research, we provide a survey of machine learning frameworks for both distributed (clusters of schools and universities) and centralized (a single university or school) educational institutions to predict the quality of students' learning outcomes and to find solutions for improving the quality of their education systems. Additionally, this work explores the application of ML in teaching and learning for further improvements to the learning environment in centralized and distributed education systems.
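
    A minimal sketch, using entirely synthetic activity features, of the kind of student-outcome prediction model the survey refers to; the feature names and data are hypothetical, not drawn from the survey.

```python
# Minimal sketch: predicting a pass/fail outcome from student activity
# features. The features and the synthetic data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Columns: hours on platform, assignments submitted, forum posts.
X = rng.normal(loc=[10, 5, 3], scale=[4, 2, 2], size=(150, 3))
y = (X[:, 0] + 2 * X[:, 1] + rng.normal(0, 3, 150) > 18).astype(int)  # 1 = pass

model = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted pass probability:", model.predict_proba([[12, 6, 2]])[0, 1])
```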

    BinComp: A Stratified Approach to Compiler Provenance Attribution

    Compiler provenance encompasses numerous pieces of information, such as the compiler family, compiler version, optimization level, and compiler-related functions. The extraction of such information is imperative for various binary analysis applications, such as function fingerprinting, clone detection, and authorship attribution. It is thus important to develop an efficient and automated approach for extracting compiler provenance. In this study, we present BinComp, a practical approach that analyzes the syntax, structure, and semantics of disassembled functions to extract compiler provenance. BinComp has a stratified architecture with three layers. The first layer applies a supervised compilation process to a set of known programs to model the default code transformations of compilers. The second layer employs an intersection process that disassembles functions across the compiled binaries to extract statistical features (e.g., numerical values) from common compiler/linker-inserted functions; this layer labels the compiler-related functions. The third layer extracts semantic features from the labeled compiler-related functions to identify the compiler version and the optimization level. Our experimental results demonstrate that BinComp is efficient in terms of both computational resources and time.
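
    The intersection idea of the second layer can be sketched as follows, assuming each disassembled function is reduced to a fingerprint hash: fingerprints shared across binaries compiled from unrelated known programs are labeled as likely compiler/linker-inserted. The hashes and program names below are hypothetical.

```python
# Minimal sketch (not BinComp itself) of the second layer's intersection idea:
# fingerprints that appear in every binary built from unrelated known programs
# are labeled compiler/linker-inserted. Hashes and names are hypothetical.
def compiler_related(binaries):
    """binaries: {program_name: {function_name: fingerprint_hash}}.
    Returns fingerprints common to every binary, i.e. likely compiler helpers."""
    fingerprint_sets = [set(funcs.values()) for funcs in binaries.values()]
    return set.intersection(*fingerprint_sets)

binaries = {
    "prog_a": {"main": "h1", "__libc_start": "h9", "_security_cookie": "h7"},
    "prog_b": {"main": "h2", "__libc_start": "h9", "_security_cookie": "h7"},
    "prog_c": {"main": "h3", "__libc_start": "h9", "_security_cookie": "h7"},
}
print(compiler_related(binaries))   # {'h9', 'h7'} -> candidate compiler-related functions
```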

    OBA2: An Onion approach to Binary Code Authorship Attribution

    A critical aspect of malware forensics is authorship analysis. The successful outcome of such analysis is usually determined by the reverse engineer’s skills and by the volume and complexity of the code under analysis. To assist reverse engineers in this tedious and error-prone task, it is desirable to develop reliable and automated tools for supporting the practice of malware authorship attribution. In a recent work, machine learning was used to rank and select syntax-based features such as n-grams and flow graphs. The experimental results showed that the top-ranked features were unique for each author, which was regarded as evidence that those features capture the authors’ programming styles. In this paper, however, we show that the uniqueness of features does not necessarily correspond to authorship. Specifically, our analysis demonstrates that many “unique” features selected using this method are clearly unrelated to the authors’ programming styles, for example, unique IDs or random but unique function names generated by the compiler; furthermore, the overall accuracy is generally unsatisfactory. Motivated by this discovery, we propose a layered Onion Approach for Binary Authorship Attribution, called OBA2. The novelty of our approach lies in its three complementary layers: preprocessing, syntax-based attribution, and semantic-based attribution. Experiments show that our method produces results that are not only more accurate but also have a meaningful connection to the authors’ styles.
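
    A minimal sketch of the layered structure only: each layer narrows the candidate author set before the next, more expensive layer runs. The layer functions and the toy data are placeholders, not OBA2's actual analyses.

```python
# Minimal sketch of a layered ("onion") attribution pipeline: preprocessing,
# then a cheap syntax-based filter, then a semantic-based ranking. All layer
# logic and data below are placeholder assumptions, not OBA2's methods.
def preprocess(binary):
    """Strip compiler-generated noise; return normalized functions (placeholder)."""
    return binary["functions"]

def syntax_layer(functions, candidates):
    """Keep authors whose syntactic profile matches these functions (placeholder)."""
    styles = {f["ngram_style"] for f in functions}
    return [a for a in candidates if a["ngram_style"] in styles]

def semantic_layer(functions, candidates):
    """Rank surviving candidates by a semantic-similarity score (placeholder)."""
    return sorted(candidates, key=lambda a: a["semantic_score"], reverse=True)

def attribute(binary, candidates):
    functions = preprocess(binary)
    survivors = syntax_layer(functions, candidates)
    return semantic_layer(functions, survivors)

# Toy usage with invented author profiles.
binary = {"functions": [{"ngram_style": "dense-loops"}]}
candidates = [{"name": "author1", "ngram_style": "dense-loops", "semantic_score": 0.8},
              {"name": "author2", "ngram_style": "recursive", "semantic_score": 0.9}]
print([a["name"] for a in attribute(binary, candidates)])
```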